OpenAI Unveils New AI Models: o3 and o4-mini

OpenAI has introduced two new AI models, o3 and o4-mini. Unlike models that respond immediately, these first work through an internal reasoning process before answering, with the aim of delivering more logical and accurate responses.
The o3 model is OpenAI’s most powerful “thinking” AI to date. It appears to outperform previous versions across a wide range of tasks, including mathematics, coding, logical reasoning, scientific analysis, and visual interpretation. Meanwhile, the o4-mini model offers a more cost-effective alternative that balances speed and performance. OpenAI has confirmed that both models can use the built-in ChatGPT tools for web browsing, Python code execution, image processing, and image generation, and that they can incorporate analysis of user-uploaded images directly into their response workflow.

These new models are available immediately to subscribers of the ChatGPT Pro, Plus, and Team plans. In addition, OpenAI has rolled out an o4-mini-high variant, which allocates extra compute time to further boost response accuracy.
In OpenAI's internal evaluations on SWE-bench, a benchmark that measures coding proficiency, o3 scored 69.1% and o4-mini scored 68.1%. Beyond their enhanced visual processing capabilities, both models can execute Python code directly in the browser and perform live internet searches to access the latest information.
For developers, OpenAI offers integration via the Chat Completions API and the Responses API. Pricing is set at $10 per million input tokens and $40 per million output tokens for the o3 model; the o4-mini model carries the same rates as its predecessor, o3-mini.
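To make the o3 rates concrete, here is a minimal sketch of the arithmetic for a single request, using only the two prices quoted above. The function name and the token counts in the usage note are hypothetical, chosen for illustration; the o4-mini rates are not stated here, so they are left out.

```python
from decimal import Decimal

# o3 rates as quoted in this article (USD per one million tokens).
O3_INPUT_USD_PER_MILLION = Decimal("10")
O3_OUTPUT_USD_PER_MILLION = Decimal("40")
TOKENS_PER_MILLION = Decimal(1_000_000)


def estimate_o3_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Return an estimated USD cost for one o3 request.

    Hypothetical helper: it multiplies token counts by the quoted rates
    and ignores anything the article does not mention.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_cost = Decimal(input_tokens) * O3_INPUT_USD_PER_MILLION / TOKENS_PER_MILLION
    output_cost = Decimal(output_tokens) * O3_OUTPUT_USD_PER_MILLION / TOKENS_PER_MILLION
    return input_cost + output_cost
```

For example, a hypothetical request with 250,000 input tokens and 50,000 output tokens would come to $2.50 for input plus $2.00 for output, or $4.50 in total. Output tokens cost four times as much as input tokens at these rates, so long responses dominate the bill.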
Looking ahead, OpenAI plans to launch an o3-pro model exclusively for ChatGPT Pro subscribers in the coming weeks. CEO Sam Altman noted that o3 and o4-mini may be the final standalone reasoning‑focused models released before GPT-5.

Interesting to see both o3 and o4-mini announced together—feels like a strategic move to offer scalable solutions for different user needs. I’m curious how these models compare in terms of performance and where each might shine.